Quality Assessment of Linked Datasets Using Probabilistic Approximation
Authors
Abstract
With the increasing application of Linked Open Data, assessing the quality of datasets by computing quality metrics becomes an issue of crucial importance. For large and evolving datasets, an exact, deterministic computation of the quality metrics is too time-consuming or expensive. We employ probabilistic techniques such as Reservoir Sampling, Bloom Filters and Clustering Coefficient estimation to implement a broad set of data quality metrics in an approximate but sufficiently accurate way. Our implementation is integrated into the comprehensive data quality assessment framework Luzzu. We evaluated its performance and accuracy on Linked Open Datasets of broad relevance.
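To make two of the techniques named in the abstract concrete, the following is a minimal sketch of Reservoir Sampling and a Bloom Filter in Python. It is not the Luzzu implementation, which the abstract does not describe in detail. The names `reservoir_sample`, `BloomFilter` and `estimate_ratio`, the default sizes, and the choice of SHA-256 as the hash function are all assumptions made for illustration.

```python
import hashlib
import random


def reservoir_sample(stream, k, rng=None):
    """Return a uniform random sample of up to k items from a stream
    of unknown length (classic Algorithm R)."""
    if rng is None:
        rng = random.Random()
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Replace a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


class BloomFilter:
    """Approximate set membership: no false negatives, some false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        # Illustrative defaults, not tuned for any real dataset.
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def _positions(self, item):
        # Derive num_hashes positions by salting a single hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        return all(self.bits[pos] for pos in self._positions(item))


def estimate_ratio(stream, predicate, k, rng=None):
    """Hypothetical metric helper: estimate the fraction of items that
    satisfy `predicate` from a reservoir sample instead of a full scan."""
    sample = reservoir_sample(stream, k, rng)
    if not sample:
        return 0.0
    return sum(1 for item in sample if predicate(item)) / len(sample)
```

In a quality-assessment setting, `estimate_ratio` could approximate a ratio-style metric over a stream of triples, while a `BloomFilter` could track which resources have already been seen without storing them all. The accuracy of both depends on the sample size and filter size chosen.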
Similar resources
Approximation Methods for Solving the Equitable Location Problem with Probabilistic Customer Behavior
Location-allocation of facilities in service systems is an essential factor in their performance. One situation that is rarely addressed in the relevant literature is balancing service among customers in addition to minimizing location-allocation costs. This is an important issue, especially in the public sector. A review of recent research in this field shows that most of t...
Luzzu Quality Metric Language - A DSL for Linked Data Quality Assessment
The steadily growing number of linked open datasets brought about a number of reservations amongst data consumers with regard to the datasets’ quality. Quality assessment requires significant effort and consideration, including the definition of data quality metrics and a process to assess datasets based on these definitions. Luzzu is a quality assessment framework for linked data that allows d...
Improving Curated Web-Data Quality with Structured Harvesting and Assessment
This paper describes a semi-automated process, framework and tools for harvesting, assessing, improving and maintaining high-quality linked data. The framework, known as DaCura, provides dataset curators, who may not be knowledge engineers, with tools to collect and curate evolving linked data datasets that maintain quality over time. The framework encompasses a novel process, workflow and arc...
Assessing Quantity and Quality of Links Between Link Data Datasets
The Linked Data Web is growing and it becomes increasingly necessary to analyze the relationship between datasets to exploit its full value. LOD datasets can range from datasets with low cohesion – containing data from different Fully Qualified Domain Names (FQDN) and namespaces – to highly cohesive datasets. This paper evaluates the quantity and quality of links between distributions, datasets...
Probabilistic Seismic Hazard Assessment of Tehran Based on Arias Intensity
A probabilistic seismic hazard assessment in terms of Arias intensity is presented for the city of Tehran. Tehran is the capital and the most populous city of Iran. From economic, political and social points of view, Tehran is the most significant city of Iran. Many destructive earthquakes have occurred in Iran in recent centuries. Historical references indicate that the old city of Rey and the...
Journal title:
Volume, issue:
Pages: -
Publication date: 2015